A High Performance Redundancy Scheme for Cluster File Systems
Abstract
A known weakness of striped file systems is their vulnerability to disk failures. In this paper we address the challenges of augmenting an existing file system with traditional RAID redundancy, and we propose a novel hybrid redundancy scheme designed to maximize disk throughput as seen by applications. To demonstrate the hybrid scheme we build CSAR, a proof-of-concept implementation based on PVFS, and test its performance using both microbenchmarks and representative scientific applications. Whereas either of the traditional schemes, RAID1 or RAID5, can deliver the higher bandwidth depending on the nature of the load, the hybrid scheme consistently performs as well as the better of the two. The application-dependent, potentially larger storage footprint of our scheme is justified by current technological trends that put I/O bandwidth at a premium over disk space.
Similar resources
CSAR: Cluster Storage with Adaptive Redundancy
Striped file systems such as the Parallel Virtual File System (PVFS) deliver high-bandwidth I/O to applications running on clusters. An open problem of existing striped file systems is how to provide efficient data redundancy to decrease their vulnerability to disk failures. In this paper we describe CSAR, a version of PVFS augmented with a novel redundancy scheme that addresses the efficiency ...
CSAR-2: A Case Study of Parallel File System Dependability Analysis
Modern cluster file systems such as PVFS that stripe files across multiple nodes have been shown to provide high aggregate I/O bandwidth but are prone to data loss, since the failure of a single disk or server affects the whole file system. To address this problem, a number of distributed data redundancy schemes have been proposed that represent different trade-offs between performance, storage effici...
Cold standby redundancy optimization for nonrepairable series-parallel systems: Erlang time to failure distribution
In modeling a cold standby redundancy allocation problem (RAP) with an imperfect switching mechanism, deriving a closed-form expression for system reliability is too difficult. A convenient lower bound on system reliability has been proposed, and this approximation is widely used in the literature as part of the objective function for system reliability maximization problems. Considering this lower bound do...
CSAR-2: a Case Study of Parallel File System Dependability
Modern cluster file systems such as PVFS that stripe files across multiple nodes have been shown to provide high aggregate I/O bandwidth but are prone to data loss, since the failure of a single disk or server affects the whole file system. To address this problem, a number of distributed data redundancy schemes have been proposed that represent different trade-offs between performance, storage effici...
Research Directions in Parallel I/O for Clusters
Parallel I/O remains a critical problem for cluster computing. A significant number of important applications need high performance parallel I/O and most cluster systems provide enough hardware to deliver the required performance. System software for achieving the desired goals remains in the research and development stage. A number of parallel file systems have achieved remarkable goals in one...